Some Evolving Conventions and Standards for
Character Information Coded in Six, Seven, and Eight Bits


John L. Little

ABSTRACT 

This Technical Note describes some of the properties of the USA
Standard Code for Information Interchange, widely known as ASCII
or USASCII. It also relates this code, which comprises seven
information bits representing 128 coded characters, to similar
international codes, and presents the national code for Russia as an
example of the worldwide acceptance of the code. Some of the
conventions that are evolving to relate these seven-bit codes to
six- and eight-bit computer codes are given, as are conventions for
extending the code to represent an unlimited repertoire of concepts.
Two alternate arrangements of the code table are shown to facilitate
an understanding of its structure and application.

Keywords: ASCII; character codes; code extension; coded character
sets; collating sequence; computer codes; data transmission codes;
information code structure; international information code; Russian
code standard; standard information codes; USASCII.
1. The Evolution of the USA Standard Code for Information Interchange

Coding of character-oriented information into binary bit patterns is
as old as automatic telegraphy, or about a century. Perforated tape
and punched cards have been in use since about the turn of the century.
Magnetic wire, tape, cores, drums, and disks have come into widespread
use in conjunction with automatic recording of data and processing by
digital computers. Independent development of such equipment resulted
in a chaotic variety of conventions and practices, for the physical
techniques as well as the logical coding patterns.

Punched card codes, usually attributed to Herman Hollerith, originally
represented numbers and signs, later letters, and still later special
symbols. The evolution always retained historical features of earlier
structure, and these in turn influenced the coding of numbers, letters,
and symbols in many electronic computers developed much later than
the card codes. Although the grouping of characters in card codes
influenced the grouping of characters in computer codes, the card codes
themselves were never carried over and used internally in computers,
since they are not well suited for that purpose. Minor variations in the
special symbol codes used in punched cards are still creating much
confusion, but standardization is being attempted.

Perforated tape and magnetic tape grew and proliferated in a much greater
variety of widths, recording densities, and coding schemes. Standardization
is attempting to bring order out of the chaos.


Magnetic disk packs are currently proliferating, and the chaos is
somewhat in evidence, but relatively early attempts are being made
to provide interchangeability of disk packs in order to increase
their utility to users. This requires standards for physical dimensions,
magnetic properties, track layout, recording techniques and density,
and coding of character information, as well as automatic labels for
identification and data structure or format specification.

By 1960 the chaos of data representation had reached such proportions
that a committee on computers and information processing, known as
X3, was established by the American Standards Association. Among its
goals was the establishment of a standard code for the representation
of character-oriented information, and standard implementations of that
code on the various media, such as magnetic tape, perforated tape,
punched cards, and electrical communication.

The complexity of the problem and the many conflicting requirements
facing subcommittee X3.2, known as Codes and Input/Output, posed
an awesome task, requiring such diverse considerations as collating
sequence, keyboard conventions, provision of an adequate assortment
of graphic symbols, format effectors, communication controls, device
controls, and information separators, as well as appropriate subsets
for use in smaller coded character sets, and provisions for expansion
to encompass an unlimited repertoire of concepts not explicitly
embedded in the code proper. These considerations and their resolution
as reflected in the code are given in greater detail in the appendices
to the standard (References 1 and 12).

The initials ASCII (pronounced ask'-ee), standing for "American Standard
Code for Information Interchange," as well as USASCII (pronounced
you-sass-key), standing for "USA Standard Code for Information
Interchange," have become familiar by now in many circles dealing with
automated information in coded form. The code is the United States'
national version of an international code adopted by several international
standardizing bodies and manufacturing associations. Many countries
have adopted their own national versions of this code and have given
it mandatory legal status. The United States Government has adopted
it as a Federal Information Processing Standard (FIPS 1), and it will
become a common requirement in many government-supported and -sponsored
systems: federal, state, and local.

2. Structure of Seven Bit ASCII
ASCII is a seven-bit code having 128 coded character positions, as
shown in Figure 1. The seven bits are designated b1 through b7 from
low order to high order of significance. The eight columns are numbered
0 through 7, corresponding to the octal values of the three high-order
bits b5, b6, and b7. The sixteen rows are numbered 0 through 15 in
accordance with the four low-order bits b1, b2, b3, and b4. The code
extends from seven "zero" bits, the Null character (NUL), to seven "one"
bits, the Delete character (DEL). Columns 0 and 1 contain 32 control
characters, and columns 2 through 7 contain 95 graphic characters and
the Delete character.
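As a minimal sketch (in Python, with hypothetical helper names `ascii_bits` and `column_row`), the bit designations and the column/row layout described above can be computed directly from a 7-bit code value: the column is the value of b7 b6 b5 and the row is the value of b4 b3 b2 b1.

```python
def ascii_bits(code: int) -> dict[str, int]:
    """Return bits b1 (low order) through b7 (high order) of a 7-bit code."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit code")
    return {f"b{n}": (code >> (n - 1)) & 1 for n in range(1, 8)}


def column_row(code: int) -> tuple[int, int]:
    """Column = value of b7 b6 b5 (0-7); row = value of b4 b3 b2 b1 (0-15)."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit code")
    return code >> 4, code & 0x0F


print(column_row(0x0D))   # CR -> (0, 13)
```

Counting the codes whose column is 0 or 1 gives the 32 control characters mentioned above.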


The meanings of the two- and three-letter groups in the ASCII code
table are given below. Definitions of these functions and names
of the single-symbol graphic characters are given in the standard
(Reference 1).

NUL   Null
SOH   Start of Heading
STX   Start of Text
ETX   End of Text
EOT   End of Transmission
ENQ   Enquiry
ACK   Acknowledge
BEL   Bell (Audible or attention signal)
BS    Backspace
HT    Horizontal Tabulation (punched card skip)
LF    Line Feed
VT    Vertical Tabulation
FF    Form Feed
CR    Carriage Return
SO    Shift Out
SI    Shift In
DLE   Data Link Escape
DC1   Device Control 1
DC2   Device Control 2
DC3   Device Control 3
DC4   Device Control 4 (STOP)
NAK   Negative Acknowledge
SYN   Synchronous Idle
ETB   End of Transmission Block
CAN   Cancel
EM    End of Medium
SUB   Substitute
ESC   Escape
FS    File Separator
GS    Group Separator
RS    Record Separator
US    Unit Separator
SP    Space
DEL   Delete

ASCII is frequently referred to as "an eight level code." This is
usually due to the addition of a parity bit in paper tape and a parity
bit included in the character structure of communication systems.
The parity bit is not included in the code standard (Reference 1), but
its inclusion and sense (odd or even) are basic to certain other
standards which specify the implementation of the code in media or in
communication. There are also various implementations of the seven-bit
code in eight-bit environments, and the method of embedment of
the seven code bits within an eight-bit environment has caused much
debate, but a resolution appears to have been achieved, as will be
discussed.
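The parity bit mentioned above is simple to compute. The sketch below assumes, purely for illustration, that the parity bit occupies the eighth (high-order) position of an eight-bit frame; the actual position and sense are fixed by the media and communication standards, not by the code standard. The function names are hypothetical.

```python
def add_parity(code: int, even: bool = True) -> int:
    """Attach a parity bit (assumed here to be the eighth bit) to a 7-bit code."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit code")
    ones = bin(code).count("1")
    if even:
        parity = ones % 2          # makes the total number of ones even
    else:
        parity = 1 - ones % 2      # makes the total number of ones odd
    return (parity << 7) | code


def parity_ok(frame: int, even: bool = True) -> bool:
    """Check an eight-bit frame produced by add_parity."""
    ones = bin(frame & 0xFF).count("1")
    return ones % 2 == (0 if even else 1)


print(hex(add_parity(ord("C"))))   # 'C' = 1000011 has three ones -> 0xc3
```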

The 128 characters of ASCII are arranged in a collating sequence,
with NUL low and DEL high in the sequence. The non-printing graphic
character Space (SP) collates ahead of all other graphic characters.
The collating sequence presents a natural arrangement for machine
sorting and file structuring, and it is the principal consideration
in the structure of ASCII.
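Because the collating sequence is simply the order of binary values, sorting byte strings by their code values reproduces it. A minimal Python sketch (the helper name is hypothetical); Python compares `bytes` objects element by element, so sorting ASCII-encoded keys follows the ASCII sequence:

```python
def ascii_collate(words: list[str]) -> list[str]:
    """Sort strings by the binary values of their ASCII codes."""
    return sorted(words, key=lambda w: w.encode("ascii"))


print(ascii_collate(["apple", "Zebra", "Apple", " indent", "42"]))
# [' indent', '42', 'Apple', 'Zebra', 'apple']
```

Space sorts first, and every upper case letter precedes every lower case letter, a property taken up again in section 10.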

The ten decimal digits are positioned in order in column 3 so that
their value is represented by the binary values of the four low-order
bits b1 through b4. The upper case and lower case Latin alphabets
are each located in 26 contiguous positions. The code for an upper
case letter differs from that of its corresponding lower case letter
in only one bit, b6. Other collating and character positioning
considerations are given in section 10 and in the standard (Reference 1).
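Both properties reduce to single bit operations. A sketch with hypothetical helper names: the digit value is the four low-order bits, and flipping b6 (value 32) switches a Latin letter between cases.

```python
def digit_value(ch: str) -> int:
    """Value of an ASCII digit, read from the four low-order bits b1-b4."""
    code = ord(ch)
    if not 0x30 <= code <= 0x39:          # column 3, rows 0-9
        raise ValueError(f"{ch!r} is not an ASCII digit")
    return code & 0x0F


def toggle_case(ch: str) -> str:
    """Switch a Latin letter between upper and lower case by flipping b6."""
    code = ord(ch)
    if 0x41 <= code <= 0x5A or 0x61 <= code <= 0x7A:
        return chr(code ^ 0x20)           # b6 has the value 2**5 = 32
    return ch                             # non-letters are left alone


print(digit_value("7"), toggle_case("q"), toggle_case("Q"))   # 7 Q q
```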


3. Six, Seven, and Eight Bit Considerations
Original efforts to develop a standard code for general information
interchange, in the early 1960's, attempted to limit the code to six
bits, in which there can be only 64 uniquely coded characters. This
was soon deemed to be inadequate, although most computers of that
era represented characters in only six bits. Seven bits was deemed
to be the optimum size, with its attendant 128 unique coded bit
patterns, and that size was selected for ASCII. Eight bits was deemed
to be wasteful because of the need to transmit the extra bit in
communication systems.

Because  computer  design  favors  eight-bit  groups  in  preference  to 
seven  bits  for  several  cogent  technical  reasons,  an  eight-bit  form 
of  ASCII  may  become  even  more  prevalent  than  the  seven-bit  form 
currently  used  in  several  communication  systems. 

4. Structure of Eight Bit USASCII-8
Placement of the 128 ASCII characters within the framework of eight
bits, with its 256 positions, has been a subject of intense controversy
in standardizing circles for several years, but it appears to have
stabilized into the form shown in Figure 2, in which the ASCII characters
are in the left half of the eight-bit code table, where the high-order
bit (E8) is zero.

In ASCII the seven bits are designated b1 through b7, while in
USASCII-8 the bits are designated E1 through E8. In these notations,
"b" stands for "bit" and "E" is used for distinction, apparently
having come from the first letter of the word "eight."

Several arbitrary conventions have evolved for the eight-bit
code table of Figure 2. The table has 16 rows designated 0 through
15, and 16 columns designated with dual decimal digits 00 through
15. Positions in the table are specified by the column and row numbers
separated by a slant (slash) symbol. Thus the capital letter "T"
occupies position 05/4 and the control "Carriage Return" (CR) occupies
00/13. The double digits 00 through 15 for the column number distinguish
this notation from the similar seven-bit ASCII notation, which has
only one digit for the column number (0 through 7).
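The two position notations can be produced mechanically from a code value. A sketch with hypothetical function names, writing the column and row as in the examples above:

```python
def ascii_position(code: int) -> str:
    """Seven-bit notation: single-digit column, e.g. '0/13' for CR."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit code")
    return f"{code >> 4}/{code & 0x0F}"


def usascii8_position(code: int) -> str:
    """Eight-bit notation: two-digit column 00-15, e.g. '00/13' for CR."""
    if not 0 <= code <= 0xFF:
        raise ValueError("not an 8-bit code")
    return f"{code >> 4:02d}/{code & 0x0F}"


print(ascii_position(ord("T")), usascii8_position(ord("T")))   # 5/4 05/4
```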

Specific standard character assignments for the right half of Figure
2, in which the high-order bit E8 = 1, have not yet been made, except
possibly for EO, the eight-ones position, which is immutable. However,
it is well agreed that columns 08 and 09 should contain additional
control characters, designated for reference as K0 through K31, and
that the other six columns, 10 through 15, should contain 95 additional
graphic characters and EO. The graphics designated N0 through N63
map into the center four columns of ASCII. The graphics designated
G0 through G30 differ from the lower case alphabet positions only
in bit E8, and hence are said to map from columns 14 and 15 into
columns 6 and 7 if bit E8 is ignored or dropped.
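The agreed layout of the right half can be summarized as a small classifier. This sketch assumes that the K, N, and G designations are numbered in code order starting at 08/0, 10/0, and 14/0 respectively; the text fixes only their ranges. Dropping E8 is a mask with 0x7F. Function names are hypothetical.

```python
def classify_usascii8(code: int) -> str:
    """Name the region of the eight-bit table that a code falls in."""
    if not 0 <= code <= 0xFF:
        raise ValueError("not an 8-bit code")
    if code < 0x80:
        return "ASCII"                    # left half, E8 = 0
    if code == 0xFF:
        return "EO"                       # eight ones
    column = code >> 4
    if column <= 9:                       # columns 08 and 09
        return f"K{code - 0x80}"
    if column <= 13:                      # columns 10 through 13
        return f"N{code - 0xA0}"
    return f"G{code - 0xE0}"              # columns 14 and 15


def drop_e8(code: int) -> int:
    """Map an eight-bit code onto the seven-bit table by ignoring E8."""
    return code & 0x7F
```

Under these assumptions, G1 at 14/1 maps onto the position of "a" (6/1) when E8 is dropped.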


5. Preferred 64-Character Subset
The center four columns (2, 3, 4, 5) of ASCII have become recognized as a
preferred 64-character subset, which can be represented in six bits
by dropping bit b6 (or by dropping bits E6 and E8 in USASCII-8).
This 64-character subset can be (and often is) represented in seven
bits and eight bits.
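Dropping b6 loses no information for the center four columns: in columns 2 and 3 (b7 b6 b5 = 010, 011) b6 is 1 while b7 is 0, and in columns 4 and 5 (100, 101) b6 is 0 while b7 is 1, so b6 can always be restored as the complement of b7. A minimal sketch of the packing, with hypothetical names; the packed values also keep the ASCII collating order.

```python
def to_six_bits(code: int) -> int:
    """Pack a character from ASCII columns 2-5 into six bits by dropping b6."""
    if not 0x20 <= code <= 0x5F:
        raise ValueError("not in the preferred 64-character subset")
    b7 = (code >> 6) & 1
    return (b7 << 5) | (code & 0x1F)      # keep b7 and b1-b5


def from_six_bits(six: int) -> int:
    """Restore the seven-bit code; b6 is the complement of b7 in columns 2-5."""
    if not 0 <= six <= 0x3F:
        raise ValueError("not a six-bit value")
    b7 = (six >> 5) & 1
    b6 = 1 - b7
    return (b7 << 6) | (b6 << 5) | (six & 0x1F)
```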

6. Representation of ASCII in Six Bits
A six-bit code is constrained to 64 unique characters. Sometimes
it is desired to represent the ASCII set of 128 characters, or even
more, in six-bit systems. In the six-bit world of seven-track magnetic
tape, especially in Europe, the six bits normally represent the 64
characters shown in columns 2, 3, 4, and 5 of Figure 1, with bit b6
being ignored. (In library work, though, the six bits usually represent
characters in columns 2, 3, 6, and 7.) In this situation, the character
"Underline" in position 5/15 serves as a flag character whose meaning
is: the next six-bit character following the flag is from one of the
outside four columns of ASCII, two columns to the left or right of
its normal position in the 64-character set. This scheme allows
for the representation of 64 additional concepts as two-character
sequences. A "real Underline" is represented as two successive
Underlines; since Underline followed by Underline would otherwise denote
position 7/15, the character "Delete" in 7/15 cannot be represented
by this particular convention.
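The Underline flag convention can be sketched as an encoder and decoder. This sketch assumes the six-bit values are formed by dropping b6 as in section 5. Under that assumption, moving a character two columns left (2 to 0, 3 to 1) or right (4 to 6, 5 to 7) amounts to flipping b6, so the flagged character is recovered with an exclusive OR. All names are hypothetical.

```python
UNDERLINE_SIX = 63                        # Underline, 5/15, as a six-bit value


def _to_six(code: int) -> int:
    # Columns 2-5 only: keep b7 and b1-b5, drop b6.
    return (((code >> 6) & 1) << 5) | (code & 0x1F)


def _from_six(six: int) -> int:
    # Restore b6 as the complement of b7.
    b7 = (six >> 5) & 1
    return (b7 << 6) | ((1 - b7) << 5) | (six & 0x1F)


def encode_six(text: bytes) -> list[int]:
    out: list[int] = []
    for code in text:
        if code == 0x5F:                  # a real Underline: two Underlines
            out += [UNDERLINE_SIX, UNDERLINE_SIX]
        elif 0x20 <= code <= 0x5E:        # columns 2-5: one six-bit character
            out.append(_to_six(code))
        elif code == 0x7F:
            raise ValueError("Delete cannot be represented in this convention")
        elif code <= 0x7F:                # columns 0, 1, 6, 7: flag + moved char
            out += [UNDERLINE_SIX, _to_six(code ^ 0x20)]
        else:
            raise ValueError("not a 7-bit code")
    return out


def decode_six(values: list[int]) -> bytes:
    out = bytearray()
    i = 0
    while i < len(values):
        six = values[i]
        if six == UNDERLINE_SIX:
            if i + 1 >= len(values):
                raise ValueError("flag character at end of stream")
            nxt = values[i + 1]
            # Underline + Underline is a real Underline; otherwise flip b6
            # to move the character two columns left or right.
            out.append(0x5F if nxt == UNDERLINE_SIX else _from_six(nxt) ^ 0x20)
            i += 2
        else:
            out.append(_from_six(six))
            i += 1
    return bytes(out)


print(encode_six(b"Ab_"))   # [33, 63, 34, 63, 63]
```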


7.  Representation  of  Concepts  Beyond  the  Code 
The  7-bit  ASCII  is  constrained  to  the  128  characters  shown  in  Figure 
1  or  in  the  left  half  of  Figure  2.  Pressure  has  naturally  arisen 
from  many  user  sources  to  allow  for  the  representation  of  concepts 
not  explicitly  representable  in  ASCII.  This  pressure  has  been  noted 
in  6,  7,  and  8  bit  environments.  Several  preferred  solutions  are 
becoming  apparent. 

If a system is constrained to exist in seven bits, then additional
concepts can be provided with the aid of four of the ASCII control
characters: "Shift Out" (SO) in 0/14, "Shift In" (SI) in 0/15, "Escape"
(ESC) in 1/11, and "Data Link Escape" (DLE) in 1/0.

SO is used as a precedence character to alter the interpretation
of succeeding graphic characters from columns 2 through 7 until the
corresponding SI character appears and restores the normal ASCII
interpretation to those graphic characters. In seven bits, the
meanings of the control characters in columns 0 and 1 remain unchanged
by SO or SI.

The character "Data Link Escape" (DLE) in 1/0 is used exclusively
to alter the interpretation of immediately succeeding ASCII communication
control characters from columns 0 and 1 of Figure 1. There are ten
of these (SOH, STX, ETX, EOT, ENQ, ACK, DLE itself, NAK, SYN, and ETB).
All ten of these are from the top half (rows 0 through 7) of columns
0 and 1.
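A sketch of how a receiver might spot DLE sequences in a seven-bit stream, using the ten communication controls listed above. The function name is hypothetical, and treating each DLE pair as one consumed unit (so that DLE DLE is read once) is an assumption of the sketch.

```python
DLE = 0x10
# SOH, STX, ETX, EOT, ENQ, ACK, DLE, NAK, SYN, ETB as 7-bit code values.
COMM_CONTROLS = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x10, 0x15, 0x16, 0x17}


def find_dle_sequences(stream: bytes) -> list[tuple[int, int]]:
    """Return (offset, control) for each DLE that modifies a communication control."""
    found = []
    i = 0
    while i < len(stream) - 1:
        if stream[i] == DLE and stream[i + 1] in COMM_CONTROLS:
            found.append((i, stream[i + 1]))
            i += 2                        # the pair is consumed as one unit
        else:
            i += 1
    return found


# All ten lie in rows 0-7 ("top half") of columns 0 and 1.
assert all((c & 0x0F) < 8 and (c >> 4) <= 1 for c in COMM_CONTROLS)

print(find_dle_sequences(bytes([0x10, 0x02, 0x41, 0x10, 0x10, 0x10, 0x03])))
# [(0, 2), (3, 16), (5, 3)]
```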

The character "Escape" (ESC) in 1/11 provides the greatest flexibility
of any, with essentially unlimited capability. A convention appears to
be evolving prescribing the use of ESC in seven bits, as follows: If the
character ESC is immediately followed by any graphic character from
columns 3, 4, 5, 6, or 7, then that two-character sequence defines a new
concept, which may be graphic or control in nature. If the character
ESC is immediately followed by an appropriate control character
from columns 0 or 1 (such as a "format effector," a "device control,"
or an "information separator" character), then that two-character
sequence defines a new control concept which is closely related
to the normal control function of the character following ESC. The
terms "locking" and "non-locking" are commonly used to describe
the duration of the newly defined concept. The "locking" actions can
endure until an "unlocking" action occurs. "Locking" features can be
cascaded one over the other, and a master "restore" convention has
not been prescribed as a standard.

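The two-character rules above depend only on the column of the character that follows ESC. A sketch with a hypothetical function name; treating DEL as outside the convention is an assumption, and the column 2 case anticipates the longer sequences described next.

```python
ESC = 0x1B


def classify_after_esc(second: int) -> str:
    """Classify the character that immediately follows ESC in seven bits."""
    if not 0 <= second <= 0x7F:
        raise ValueError("not a 7-bit code")
    if second == 0x7F:
        return "not-covered"              # DEL is not a graphic character
    column = second >> 4
    if column >= 3:
        return "new-concept"              # graphic or control in nature
    if column == 2:
        return "longer-sequence"          # ESC + column 2 ... + final character
    return "control-variant"              # columns 0-1, if an appropriate control
```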
Unlimited possibilities occur if the character ESC is immediately
followed by a graphic character from column 2 of Figure 1, still
confining our attention to seven bits. Complete details are yet
lacking, but in Europe a convention is evolving around the definition
of three-character Escape sequences in which the first character is
ESC, the second character is a graphic from column 2, and the third
is a "final" character, which can be any graphic character from
columns 3, 4, 5, 6, or 7, or any suitably selected character from
columns 0 or 1. The class of a concept defined by a three-character
sequence is determined by the middle character, from column 2, and
the classification is as follows, where "(F)" represents a "final"
character:
Sequence      To Represent

ESC ! (F)     Device controls
ESC " (F)     Format controls
ESC # (F)     Sets of controls "K" in 8 bits, columns 08 and 09
ESC $ (F)     Control sets in 7 bits
ESC % (F)     Shift Out sets in 7 bits
ESC & (F)     Sets (columns 0 to 7) in 7 bits
ESC ' (F)     Sets with more or less than 7 bits
ESC ( (F)     National choice sets in 7 bits

Four-character  Escape  sequences  can  be  generated  using  ESC  followed 
by  two  characters  from  column  2  and  a  final  character.  All  four- 
character  Escape  sequences  are  reserved  for  private  use,  not  subject 
to  standardization. 
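The structure just described (ESC, then zero or more column 2 characters, then a final character) lends itself to a small parser. A sketch with hypothetical names; it treats sequences with more than two column 2 characters as outside the convention, since the text does not address them, and it does not check whether a column 0 or 1 final is "suitably selected."

```python
ESC = 0x1B


def parse_escape(stream: bytes, start: int) -> tuple[bytes, int, int]:
    """Split the Escape sequence at stream[start] into its column 2
    characters and its final character.

    Returns (intermediates, final, index just past the sequence).
    """
    if stream[start] != ESC:
        raise ValueError("no ESC at start position")
    i = start + 1
    while i < len(stream) and 0x20 <= stream[i] <= 0x2F:   # column 2
        i += 1
    if i >= len(stream) or stream[i] == 0x7F:
        raise ValueError("Escape sequence has no final character")
    return stream[start + 1:i], stream[i], i + 1


def sequence_kind(intermediates: bytes) -> str:
    n = len(intermediates)
    if n == 0:
        return "two-character"
    if n == 1:
        # The class is chosen by the middle character (see the table above).
        return f"three-character, class {chr(intermediates[0])!r}"
    if n == 2:
        return "four-character, private use"
    return "longer than four characters: not covered by the convention"
```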

Having defined means for representing an unlimited repertoire of
concepts in seven bits, we now come to the matter of representing
these concepts in eight bits and of relating the seven- and eight-bit
representations.

8. Mapping Between Seven and Eight Bit Codes
Figure 2 is an eight-bit code table with the left half fully defined
as representing the 128 characters of ASCII. The right half is
defined only as having 32 controls, 95 graphics, and the Eight Ones (EO)
character. It is unlikely that the right half will ever be defined
as fully as the left half for general use. Instead, it is likely that
a family of eight-bit codes will evolve, each for a particular community
of interest, representing such communities as libraries, typesetting,
mathematics, organic chemistry, non-Latin alphabets, and many
others.

Referring  again  to  Figure  2,  we  can  think  of  the  whole  table  as 
representing  an  eight-bit  code.  Or  if  we  choose,  we  can  ignore 
bit  E8  and  think  of  the  table  as  representing  two  seven  bit  code 
tables,  the  left  one  being  ASCII,  and  the  right  one  containing 
the  Shift  Out  set  of  graphics. 

We note that there are 95 graphic characters in ASCII, including
Space (SP) in 02/0, and we note that there are 95 graphic characters
of either the N-type or G-type in the right half. Each of these
190 graphic characters has a unique bit code in eight bits, and hence,
in eight bits, they can be interspersed in a character stream in any
order, without the use of the shift characters Shift Out (SO) or
Shift In (SI). The information conveyed by SO and SI (in seven
bits) is now inherent in the eighth bit E8.

In  making  a  translation  from  a  seven  bit  character  stream  into  an 
eight  bit  stream,  we  can  monitor  for  SO  and  SI,  use  these  to  set  bit 
E8  for  subsequent  eight  bit  characters,  and  then  either  discard 
or  retain  the  SO  and  SI  characters. 

In  going  from  a  stream  of  eight  bit  graphic  characters  into  a  seven 
bit  stream,  we  can  monitor  bit  E8,  and  when  it  changes,  we  can  insert 
an  SO  or  SI  character  into  the  seven  bit  stream,  to  retain  the 
information  conveyed  by  bit  E8. 
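The two translations just described can be sketched as follows. Assumptions, all labeled: Shift Out graphics occupy the same positions in the right half as in columns 2 through 7, with E8 set; controls pass through unchanged; the eight-to-seven direction appends a closing SI so the seven-bit stream ends in the normal state (a design choice, not something the text prescribes); and the K controls and EO, which have no SO/SI equivalent, are rejected here. Function names are hypothetical.

```python
SO, SI = 0x0E, 0x0F


def is_graphic7(code: int) -> bool:
    return 0x20 <= code <= 0x7E           # columns 2-7 except DEL


def seven_to_eight(stream: bytes, keep_shifts: bool = False) -> bytes:
    """Use SO/SI to set E8 on the graphic characters that follow them."""
    out = bytearray()
    shifted = False
    for code in stream:
        if code in (SO, SI):
            shifted = code == SO
            if keep_shifts:
                out.append(code)
        elif shifted and is_graphic7(code):
            out.append(code | 0x80)       # Shift Out graphic -> right half
        else:
            out.append(code)              # controls are unaffected by SO/SI
    return bytes(out)


def eight_to_seven(stream: bytes) -> bytes:
    """Insert SO or SI whenever E8 changes between graphic characters."""
    out = bytearray()
    shifted = False
    for code in stream:
        if is_graphic7(code & 0x7F):      # a graphic in either half
            wanted = code >= 0x80
            if wanted != shifted:
                out.append(SO if wanted else SI)
                shifted = wanted
            out.append(code & 0x7F)
        elif code < 0x80:
            out.append(code)              # seven-bit controls pass through
        else:
            raise ValueError(f"{code:#04x}: K controls and EO need other rules")
    if shifted:
        out.append(SI)                    # design choice: end in normal state
    return bytes(out)
```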
A convention has been gaining favor relating the 32 "K" controls
of columns 08 and 09 to two-character sequences in seven bits,
having ESC as the first character and a corresponding graphic from
columns 4 or 5 as the second character. In this transform, the
six low-order bits of the graphic character (b1 through b6) match
the six low-order bits of the K control (E1 through E6). In the
reverse process, going from seven bits to eight bits, a two-character
sequence consisting of ESC followed by a graphic character from columns
4 or 5 transforms into a single eight-bit control character in
columns 08 or 09, in a position corresponding to the position
of the graphic character. Other multi-character Escape sequences
remain the same in seven- and eight-bit streams.
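Because the six low-order bits are shared, each direction of this transform is a single mask. A sketch with hypothetical function names:

```python
ESC = 0x1B


def k_to_seven(stream: bytes) -> bytes:
    """Replace each K control (columns 08-09) with ESC + a column 4/5 graphic."""
    out = bytearray()
    for code in stream:
        if 0x80 <= code <= 0x9F:
            # Same six low-order bits, with b7 = 1 and b6 = 0 (columns 4 and 5).
            out += bytes([ESC, 0x40 | (code & 0x3F)])
        else:
            out.append(code)
    return bytes(out)


def esc_to_k(stream: bytes) -> bytes:
    """Replace ESC + a column 4/5 graphic with the corresponding K control."""
    out = bytearray()
    i = 0
    while i < len(stream):
        if (stream[i] == ESC and i + 1 < len(stream)
                and 0x40 <= stream[i + 1] <= 0x5F):
            out.append(0x80 | (stream[i + 1] & 0x3F))
            i += 2
        else:
            out.append(stream[i])         # other Escape sequences are unchanged
            i += 1
    return bytes(out)
```

For instance, the control at 09/11 (K27, counting K0 from 08/0) becomes ESC followed by "[" at 5/11, and an ESC $ sequence passes through `esc_to_k` untouched.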

9. Codes of More than Eight Bits

At the present time there does not appear to be any particular interest
in conventions for representing characters in codes larger than eight
bits, although devices exist, such as typesetting mechanisms, in which
more than eight bits are used to represent each character, and
coding schemes are used in computers in which two eight-bit groups
are used to represent a character or an image of a character
(Reference 6).

10. Collating Sequence of ASCII and USASCII-8

The 1963 version of the ASCII standard (Reference 3) disassociates
ASCII from collating sequence with these sentences in section 5.1:
"The standard code does not include any redundancy or define techniques
for error control. Further, it does not specify a standard collating
sequence." However, that position is reversed in the 1965 ASCII
(Reference 2) and in the current ASCII (Reference 1), in section 6.3,
which reads: "The relative sequence of any two characters, when used as
a basis for collation, is defined by their binary values."

Figure 2 defines a collating sequence for the 256 characters of the
eight-bit code, with the character NUL as the lowest in the sequence,
SOH next, and so on, with EO (Eight Ones) being the highest. The
graphic character Space (SP) collates lower than any other graphic
character.

Complaints are sometimes heard that all of the upper case letters
collate below all of the lower case letters, and that miscellaneous
symbols collate between the upper and lower case alphabets. It is
obvious from Figure 2 that these complaints are valid. However,
the painstaking structure of ASCII, in which collating sequence is
probably the dominant (though far from the only) feature, allows
for a variation such as that shown in Figure 3.

Figure 3 is a rearrangement of Figure 2 in which bit E6 has been
moved to the lowest-order position, and bits E1 through E5 are
moved up one place.

In  Figure  3,   the  alphabetic  characters  collate  in  the  sequence  A,  a, 
B,  b,  C,  c,  etc.,  with  no  gaps  in  the  sequence  from  A  to  z. 
The character Space (SP) collates lower than all other characters,
graphic as well as control, except for the character Null (NUL).
The numerals 0 through 9 still collate in the proper sequence, but
they are interspersed with control characters, and, as in the ASCII
collating sequence, other graphic characters intervene between the
numerals and the letters.
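Because Figure 3 is only a reordering of bit significance, its collating sequence can be obtained by permuting the bits of each code before comparing. A sketch (hypothetical names) that moves E6 to the lowest-order position and shifts E1 through E5 up one place, as described above:

```python
def figure3_key(code: int) -> int:
    """Reorder bits so that E6 becomes the lowest-order bit."""
    if not 0 <= code <= 0xFF:
        raise ValueError("not an 8-bit code")
    high = code & 0xC0                    # E7 and E8 keep their places
    low5 = (code & 0x1F) << 1             # E1-E5 move up one place
    e6 = (code >> 5) & 1                  # E6 moves to the bottom
    return high | low5 | e6


def figure3_sort(words: list[str]) -> list[str]:
    return sorted(words, key=lambda w: [figure3_key(b) for b in w.encode("ascii")])


print("".join(sorted("zAaBbZ", key=lambda ch: figure3_key(ord(ch)))))   # AaBbZz
```

Note that `figure3_sort(["apple", "Azure"])` places "Azure" first, which is exactly the flaw discussed next.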

At  first  glance  this  appears  to  be  a  highly  advantageous  arrangement 
for  purely  alphabetic  material,  and  such  interspersed  alphabets  have 
accordingly  been  used  in  some  computer  codes.  However,  there  are 
some  highly  subtle  flaws  in  this  collating  sequence,  as  is  vividly 
illustrated  in  Reference  10. 

For example, if a sort were made on alphabetic material using the
interleaved-letter collating sequence of Figure 3, then all words
beginning with lower case letter "a" would follow all words beginning
with capital "A", which is not the arrangement usually desired
in listings of mixed upper and lower case material, such as those
found in dictionaries, telephone directories, or library catalogs.

For sorting, the case distinction must be ignored (by ignoring bit b6
or E6), but it must be retained for printing and for fine sorts. More
detailed consideration is given in Reference 11.
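One way to follow this advice is a two-level sort key: a primary key in which bit b6 is ignored for letters (folding lower case onto upper case), and a fine key that keeps the original codes to break ties. This is a sketch under that assumption; it folds only the Latin letters, since ignoring b6 for every character would also merge unrelated symbols and controls. The helper names are hypothetical.

```python
def fold_case(code: int) -> int:
    """Clear b6 for lower case Latin letters only."""
    return code & ~0x20 if 0x61 <= code <= 0x7A else code


def dictionary_key(word: str) -> tuple[bytes, bytes]:
    raw = word.encode("ascii")
    return bytes(fold_case(c) for c in raw), raw   # (coarse sort, fine sort)


words = ["apple", "Azure", "Apple", "banana", "Baker"]
print(sorted(words, key=dictionary_key))
# ['Apple', 'apple', 'Azure', 'Baker', 'banana']
```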

Note that there is no bit reassignment whatever between Figure 2
and Figure 3. Only the order of presentation is different. But
the interspersed alphabets of Figure 3 may suggest some of the
flexibility inherent in ASCII. By suitable logical bit manipulation,
one could obtain the interspersed alphabets of Figure 3
and the contiguous digits of Figure 2.

11. Octal and Hexadecimal Representation of the Coded Characters

The column/row notation of Figures 2 and 3 is readily convertible
to hexadecimal (4-bit) notation. Single decimal digits 0 through
9 are used for both column and row, while 10 through 15 are usually
represented as single capital letters A through F for both column
and row.
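Since each column and each row number is exactly four bits, the conversion is digit for digit. A sketch with a hypothetical function name:

```python
def position_to_hex(column: int, row: int) -> str:
    """Column/row of the eight-bit table as two hexadecimal digits."""
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError("column and row must be 0-15")
    return f"{column:X}{row:X}"


print(position_to_hex(5, 4), position_to_hex(0, 13))   # 54 0D
```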

The  arrangements  of  the  code  tables  of  Figures  1  or  2  make  translation 
of  the  bit  patterns  into  octal  (3  bit)  characters  quite  an  awkward 
operation,  since  the  bits  are  not  grouped  conveniently  for  that 
purpose,  and  the  octal  characters  do  not  come  out  even  for  either 
seven  or  eight  bits,  although  they  would  for  six  or  nine  bits,  with 
no  leftover  bits. 

When seven-bit characters are represented in octal form, three
octal digits are required to represent each character, and the
high-order octal digit is always either zero or one, as shown
in the top half of Figure 4. In representing all 256 eight-bit
characters, the high-order octal digit can be 0, 1, 2, or 3. The
other two octal digits can, of course, have any value from 0 to 7.
The octal digits are shown explicitly in Figure 4 instead of column
and row numbers. The code table arrangement of Figure 4 was suggested
by Crosby (Reference 7).
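The awkwardness is visible in the bit grouping: for a seven-bit code the octal digits are b7, then b6 b5 b4, then b3 b2 b1, so the middle digit straddles the boundary between column (b7 b6 b5) and row (b4 through b1). A sketch with a hypothetical function name:

```python
def to_octal(code: int) -> str:
    """Three octal digits for a seven- or eight-bit code."""
    if not 0 <= code <= 0xFF:
        raise ValueError("not an 8-bit code")
    return f"{code:03o}"


print(to_octal(ord("A")), to_octal(0xFF))   # 101 377
```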

12. Three Evolutionary Forms of ASCII

The seven-bit ASCII (American Standard Code for Information Interchange)
has existed in its evolution in three distinct but closely related
versions. The first was known as American Standard X3.4-1963, approved
by the American Standards Association on June 17, 1963 (Reference 3).
It did not contain lower case letters, nor the characters Backspace,
Underline, and several others which have been added subsequently.

The second version was an extensive revision approved by ASA as
X3.4-1965 but never distributed to the public due to pending
international modifications in the ISO 7-bit code. However, the 1965
version was published as a proposed revision (Reference 2), along
with an exposition on the evolution of the ISO 7-bit code.

The present version is known as the United States of America Standard
Code for Information Interchange, X3.4-1967, approved by the United
States of America Standards Institute (successor to ASA) on July 7,
1967. This version was adopted as a Federal Standard by President
Lyndon B. Johnson in a memorandum for the Heads of Departments and
Agencies, dated March 11, 1968. The latest version was published
in 1968 as a combined ASCII and Federal Information Processing
Standard, FIPS 1 (Reference 12).

The evolution of ASCII, now also called USASCII, was influenced
heavily by concurrent developments of international 7-bit codes,
particularly the ISO 7-bit code (Reference 4) and the CCITT Alphabet
No. 5, which is destined to become the successor to CCITT Alphabet
No. 2, the international five-bit Baudot code, which is still in
extensive but diminishing use.

The 1965 version of ASCII represented a major renaming and repositioning
of the control functions from those originally appearing in the 1963
standard. The graphic characters "Up arrow" and "Left arrow" and the
control character "RU" (Are you ...?) were removed and replaced by
other characters. The character "@" (Commercial At) was moved
from 4/0 to 6/0, but because of many complaints, it was moved back
to 4/0 in the 1967 ASCII. "Commercial At" was originally moved to
make room for the character "Underline" in the preferred center four
columns. In a CCITT meeting the Russian delegate requested that
Underline be moved from 4/0 to 5/15 so as to pair with Delete and to
make room for 31 Cyrillic alphabetic characters in both upper and
lower case, displacing all of the characters from columns 4, 5, 6, and
7 except Underline and Delete, as shown in the right half of Figure 5.

13. National Versions of the Seven Bit Codes

The Seven Bit Code of the International Organization for Standardization
(Reference 4) and the nearly identical international communications
code of the International Consultative Committee on Telegraph and
Telephone (CCITT) contain eleven code table positions which are
reserved in varying degrees for "National Use." These eleven
positions are 2/3, 4/0, 5/11, 5/12, 5/13, 5/14, 6/0, 7/11, 7/12,
7/13, and 7/14. Position 2/3 is known to contain only the number
sign (#) and the symbol for "Pound Sterling." All of the other
positions contain a variety of symbols. ASCII has assigned the
United States' preferred symbols to these positions, and it is
noted that many other nations have assigned the same symbols where
there was no other need. Some of the ASCII national use symbols are
shown as preferred assignments in the ISO code.
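To make the positions concrete, the sketch below converts them to code values and shows the symbols ASCII assigns there. The `localize` helper and its substitution table are hypothetical; the only national assignment used, the Pound Sterling sign at 2/3, is the one mentioned above.

```python
NATIONAL_USE = ["2/3", "4/0", "5/11", "5/12", "5/13", "5/14",
                "6/0", "7/11", "7/12", "7/13", "7/14"]


def position_code(pos: str) -> int:
    column, row = (int(part) for part in pos.split("/"))
    return (column << 4) | row


def localize(text: str, table: dict[int, str]) -> str:
    """Render text under a national version, replacing symbols only at
    national-use positions that the table reassigns."""
    allowed = {position_code(p) for p in NATIONAL_USE}
    for code in table:
        if code not in allowed:
            raise ValueError(f"{code:#x} is not a national-use position")
    return "".join(table.get(ord(ch), ch) for ch in text)


print("".join(chr(position_code(p)) for p in NATIONAL_USE))   # #@[\]^`{|}~
print(localize("#5 each", {position_code("2/3"): "£"}))       # £5 each
```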

14. The Russian Code
In a USSR State Standard (Reference 8) the Russians have changed
"Shift Out" to "Russian Register" and "Shift In" to "Latin Register."
In the Russian register, they have placed uppercase Cyrillic characters
in columns corresponding to lowercase Latin letters, as shown in Figure
5, a seven-bit code. Only 31 of the usual 33 Cyrillic letters are
included. Omitted are the silent letter known as the hard sign and the
"E" or "e" with the double-dot (diaeresis) diacritical overmark. The
Cyrillic letters are arranged to match their Latin phonetic equivalents,
rather than being placed in alphabetical order. This facilitates
Latin-Cyrillic keyboard pairing, but complicates computer sorting of
Cyrillic data into alphabetical order.
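The register mechanism and the sorting complication can be illustrated with a deliberately tiny, hypothetical table: five Cyrillic capitals placed at plausible phonetic Latin positions. These placements are illustrative assumptions, not the assignments of the USSR standard or of Figure 5.

```python
RUSSIAN_REGISTER, LATIN_REGISTER = 0x0E, 0x0F   # the former SO and SI

# Hypothetical phonetic placements, for illustration only.
CYRILLIC_AT = {0x61: "А", 0x62: "Б", 0x76: "В", 0x67: "Г", 0x64: "Д"}
ALPHABET = "АБВГД"                              # their alphabetical order


def decode(stream: bytes) -> str:
    russian = False
    out = []
    for code in stream:
        if code == RUSSIAN_REGISTER:
            russian = True
        elif code == LATIN_REGISTER:
            russian = False
        elif russian and code in CYRILLIC_AT:
            out.append(CYRILLIC_AT[code])
        else:
            out.append(chr(code))         # Latin register, or a code this toy table lacks
    return "".join(out)


code_of = {letter: code for code, letter in CYRILLIC_AT.items()}
print(decode(b"\x0eda\x0f!"))                          # ДА!
print("".join(sorted(ALPHABET, key=code_of.get)))      # АБДГВ, not АБВГД
```

Sorting by code value scrambles the alphabet, so a computer sort of Cyrillic data needs a separate key, such as each letter's index in `ALPHABET`.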

The placement of uppercase Cyrillic letters in columns corresponding
to lowercase Latin letters was done to facilitate a third seven-bit code
arrangement, not illustrated, which has all lowercase letters removed,
and both Latin and Cyrillic uppercase letters in their usual positions.
This permits monocase operations with both Latin and Cyrillic alphabets
without the use of Shift Out and Shift In characters.
Use of the Russian code is explained in more detail in Reference 9.